R Markdown files consist of chunks of code written in R and text written in Markdown. You can run the code one chunk at a time or knit the entire document at once.
In the following chunk we load the packages we will need and set preferences for knitting the document. Anything after a “#” symbol on a line is a comment and is ignored by R when the code runs.
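The setup chunk itself is not shown in this section, so here is a minimal sketch of what such a chunk might look like. The package names are hypothetical placeholders (left commented out), and `echo = TRUE` is just one example of a knitting preference.

```r
# Everything after a "#" on a line is a comment; R skips it when the code runs.
x <- 1 + 1  # this trailing comment does not change the value of x

# Hypothetical package loads: the packages this tutorial actually uses are not
# shown in this section, so they are left commented out here.
# library(dplyr)
# library(ggplot2)

# Set a default option for every chunk when knitting (only if knitr is installed).
if (requireNamespace("knitr", quietly = TRUE)) {
  knitr::opts_chunk$set(echo = TRUE)  # show the code alongside its output
}
```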
Today we will be working with the “Stocks of specified dairy products” data. This data is stored in a Statistics Canada CODR (Common Output Data Repository) table.
Let’s get to know the data a bit better. Let’s check out the columns and the range of values.
##    REF_DATE             GEO               DGUID               UOM
##  Length:37812       Length:37812       Length:37812       Length:37812
##  Class :character   Class :character   Class :character   Class :character
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character
##     UOM_ID          SCALAR_FACTOR       SCALAR_ID            VECTOR
##  Length:37812       Length:37812       Length:37812       Length:37812
##  Class :character   Class :character   Class :character   Class :character
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character
##   COORDINATE            VALUE           STATUS             SYMBOL
##  Length:37812       Min.   :    0   Length:37812       Length:37812
##  Class :character   1st Qu.:  527   Class :character   Class :character
##  Mode  :character   Median : 2219   Mode  :character   Mode  :character
##                     Mean   : 6979
##                     3rd Qu.: 9419
##                     Max.   :74242
##                     NA's   :11588
##   TERMINATED          DECIMALS            GeoUID          Hierarchy for GEO
##  Length:37812       Length:37812       Length:37812       Length:37812
##  Class :character   Class :character   Class :character   Class :character
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character
##  Classification Code for Stocks   Hierarchy for Stocks
##  Length:37812                     Length:37812
##  Class :character                 Class :character
##  Mode  :character                 Mode  :character
##  Classification Code for Commodity   Hierarchy for Commodity      val_norm
##  Length:37812                        Length:37812              Min.   :    0
##  Class :character                    Class :character          1st Qu.:  527
##  Mode  :character                    Mode  :character          Median : 2219
##                                                                Mean   : 6979
##                                                                3rd Qu.: 9419
##                                                                Max.   :74242
##                                                                NA's   :11588
##       Date                                           Stocks
##  Min.   :1970-01-01   Total stocks                      :19560
##  1st Qu.:1995-04-01   Manufactures and government stocks:11070
##  Median :2004-08-01   Retail and wholesale stocks       : 7182
##  Mean   :2002-12-05
##  3rd Qu.:2013-07-01
##  Max.   :2022-06-01
##              Commodity
##  Creamery butter  :11100
##  Cheddar cheese   :10638
##  Variety cheese   : 6234
##  Whey butter      : 1314
##  Process cheese   : 1314
##  Whole milk powder:  816
##  (Other)          : 6396
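Output like the summary above can be produced with base R's `summary()`. Here is a small sketch on a made-up stand-in for the table (the data frame `dairy` and its values are assumptions for illustration, not the real data), which also shows `str()` and `range()`.

```r
# Toy stand-in for the dairy stocks table (values are made up).
dairy <- data.frame(
  REF_DATE  = c("2022-04", "2022-05", "2022-06", "2022-06"),
  GEO       = c("Canada", "Canada", "Canada", "Quebec"),
  Commodity = factor(c("Creamery butter", "Creamery butter",
                       "Cheddar cheese", "Creamery butter")),
  VALUE     = c(100, NA, 250, 40)
)

str(dairy)             # column names and types
print(summary(dairy))  # per-column summary: quartiles, factor counts, NA counts

# Range of the numeric column, ignoring missing values
value_range <- range(dairy$VALUE, na.rm = TRUE)
print(value_range)
```

Character columns only report their length and class, which is why most of the columns in the summary look the same. Numeric, date, and factor columns are the informative ones.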
What are some things you notice about the data?
In short, “cleaning data” means preparing it for analysis. Removing empty values, converting values to the same format, and otherwise manipulating your data to improve its quality and uniformity are all examples of cleaning.
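As a rough illustration of two of those cleaning steps, here is a sketch on made-up data. It assumes that dates arrive as text in a year-month form such as "2022-06" and that missing values are stored as `NA`. The names `raw` and `clean` are hypothetical.

```r
# Toy data with one missing value and dates stored as text (values are made up).
raw <- data.frame(
  REF_DATE = c("2022-04", "2022-05", "2022-06"),
  VALUE    = c(100, NA, 250),
  stringsAsFactors = FALSE
)

# 1. Remove rows whose VALUE is missing
clean <- raw[!is.na(raw$VALUE), ]

# 2. Convert the text dates to a proper Date column (first day of each month)
clean$Date <- as.Date(paste0(clean$REF_DATE, "-01"))

print(clean)
```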
How might we need to clean this data?
Now let’s drop some columns we aren’t really interested in. Using the above as an example, drop “SCALAR_FACTOR”, “SCALAR_ID”, and “DECIMALS”. Note that there are many different equivalent ways to drop columns.
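To illustrate the point about equivalent approaches, here is a sketch of three base R ways to drop columns, shown on a made-up data frame. The object names (`dairy`, `drop_cols`, `by_name`, and so on) are assumptions for illustration, and the tutorial's own example may use a different approach.

```r
# Toy data frame with a few columns we want to keep and a few to drop.
dairy <- data.frame(
  GEO           = c("Canada", "Quebec"),
  VALUE         = c(100, 40),
  SCALAR_FACTOR = c("units", "units"),
  SCALAR_ID     = c("0", "0"),
  DECIMALS      = c("0", "0")
)
drop_cols <- c("SCALAR_FACTOR", "SCALAR_ID", "DECIMALS")

# Option 1: keep every column whose name is not in drop_cols
by_name <- dairy[, !(names(dairy) %in% drop_cols)]

# Option 2: subset() with a negated column selection
by_subset <- subset(dairy, select = -c(SCALAR_FACTOR, SCALAR_ID, DECIMALS))

# Option 3: assign NULL to the unwanted columns on a copy
by_null <- dairy
by_null[drop_cols] <- NULL

print(names(by_name))
```

All three leave only `GEO` and `VALUE`. Which one to use is mostly a matter of taste and of whether the column names are held in a variable.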
We’re now a bit more familiar with our data, but it can be difficult to parse from a table! Let’s create a plot so we can get a better idea of what’s going on. What are some things we might want to find out and what is the best way to visualize them?
Since we have date information, it makes sense to make a time series plot! To keep things simple, let’s focus on creamery butter in Canada over time.
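A time series plot along those lines might be sketched as follows. This uses base R graphics and made-up values, so the data frame `dairy` and the numbers in it are assumptions, and the tutorial's own plotting code may differ.

```r
# Toy data (values are made up); Date is already a Date column.
dairy <- data.frame(
  Date = as.Date(c("2022-04-01", "2022-05-01", "2022-06-01",
                   "2022-04-01", "2022-06-01")),
  GEO = c("Canada", "Canada", "Canada", "Quebec", "Canada"),
  Commodity = c("Creamery butter", "Creamery butter", "Creamery butter",
                "Creamery butter", "Cheddar cheese"),
  VALUE = c(100, 120, 90, 40, 250)
)

# Keep only creamery butter in Canada, sorted by date
butter <- dairy[dairy$GEO == "Canada" & dairy$Commodity == "Creamery butter", ]
butter <- butter[order(butter$Date), ]

# Line plot of stocks over time
plot(butter$Date, butter$VALUE, type = "l",
     xlab = "Date", ylab = "Stocks",
     main = "Creamery butter stocks in Canada")
```

With the real table you would probably also filter on the Stocks column (for example, to “Total stocks”) so that each date contributes a single value.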
Using the code above, make a new plot showing the stocks of another dairy product in another region over time. Be sure to update the title as appropriate.
We might also be curious about the breakdown of stocks by type of dairy product. Let’s visualize this in a pie chart. To keep it simple, let’s focus on Canada and the most recent data, which is June 2022.
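One way such a pie chart could be drawn is sketched below with base R's `pie()`. The values are made up, and the names `dairy`, `latest`, `shares`, and `slice_labels` are assumptions for illustration.

```r
# Toy data (values are made up).
dairy <- data.frame(
  Date = as.Date(c("2022-06-01", "2022-06-01", "2022-06-01", "2022-05-01")),
  GEO = "Canada",
  Commodity = c("Creamery butter", "Cheddar cheese",
                "Variety cheese", "Creamery butter"),
  VALUE = c(90, 250, 160, 120)
)

# Keep only Canada in June 2022
latest <- dairy[dairy$GEO == "Canada" & dairy$Date == as.Date("2022-06-01"), ]

# Share of each commodity, used to label the slices with percentages
shares <- latest$VALUE / sum(latest$VALUE)
slice_labels <- paste0(latest$Commodity, " (", round(100 * shares), "%)")

pie(latest$VALUE, labels = slice_labels,
    main = "Dairy stocks in Canada, June 2022")
```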
Make another pie chart for another time period.
Next we need to transform our data a bit further. Let’s compute the most numerous commodity.
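Here is one possible sketch of that computation. It reads “most numerous” as “largest total stocks”, which is an assumption, and it uses made-up values. If you instead want the commodity with the most rows, `table()` on the Commodity column would be the starting point.

```r
# Toy data (values are made up); one VALUE is missing on purpose.
dairy <- data.frame(
  Commodity = c("Creamery butter", "Cheddar cheese", "Creamery butter",
                "Variety cheese", "Cheddar cheese"),
  VALUE = c(90, 250, 120, 160, NA)
)

# Total stocks per commodity; the formula interface drops rows with NA by default
totals <- aggregate(VALUE ~ Commodity, data = dairy, FUN = sum)

# Row with the largest total
top <- totals[which.max(totals$VALUE), ]
print(top)
```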
Write a short summary of what you have learned. You may wish to include some plots!
Optimization: The data.table package provides a structure that is often faster than the built-in data.frame, especially on large datasets. Can you rewrite the code to make use of this structure instead?
Collaboration: You could put your code on GitLab and have your colleagues provide feedback or even contribute to your code.
Interactivity: Shiny is an R package for building interactive web applications such as dashboards.